
UTF8 conversion

Convert text data of a database to UTF8

As of version 4.7.0, ClassiX works with Unicode. String data is encoded as UTF8, and this also applies to all text data in the database.
Therefore ClassiX 4.7.x cannot be started with older databases; the database has to be converted first.
This task is performed by the cxgosuo.exe utility.
All functions are called via the flag /8 with sub-parameters:

| Flag | Task performed | Example | Notes |
|---|---|---|---|
| /8 | converts the entire database (i.e. the logical database, which may consist of several physical sub-databases) | converts from segment 2 to the end | recommended for relatively small databases |
| /8segment s1-s2 | converts from segment s1 + 1 to s2 (inclusive) | /8segment 799-1000: segments 800 to 1000 are converted | for large databases, the conversion can be carried out in parallel in several sub-processes |
| /8segment s | converts from segment s + 1 to the end | /8segment 1000: converts the rest | |
| /8set | marks the database as Unicode-enabled | | ClassiX 4.7.x can now start with the database |
| /8reset | withdraws this marking | | |
| /8verify | checks whether all string data is UTF8-compliant | | errors are written to the ClassiX log file; see the notes on error checking |
| /8verify s1-s2 | checks from segment s1 + 1 to s2 (inclusive) | /8verify 799-1000: segments 800 to 1000 are verified | |
| /8verify s | checks from segment s + 1 to the end | /8verify 1000: verifies the rest | |
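
Because a range flag starts at the segment after its lower bound, adjacent ranges for parallel sub-processes share a boundary number (the upper bound of one range is the lower bound of the next). The following Python sketch only builds the flag strings for such a split. The function name `segment_flags` and the idea of estimating the last segment number are illustrative assumptions; how cxgosuo.exe is given the database itself is not covered here.

```python
def segment_flags(last_segment: int, workers: int) -> list[str]:
    """Build /8segment flags that split segments 2..last_segment into
    `workers` consecutive ranges (hypothetical helper, not part of ClassiX)."""
    if workers < 1 or last_segment < 2:
        raise ValueError("need at least one worker and one segment to convert")

    lower = 1                      # a range s1-s2 starts at s1 + 1, so 1 means "from segment 2"
    total = last_segment - lower   # number of segments to convert
    workers = min(workers, total)  # never create empty ranges

    # Evenly spaced boundaries; each boundary closes one range and opens the next.
    bounds = [lower + (total * i) // workers for i in range(workers + 1)]

    flags = [f"/8segment {lo}-{hi}" for lo, hi in zip(bounds, bounds[1:-1])]
    # The final range is left open-ended so that segments beyond the
    # estimated last_segment are still converted.
    flags.append(f"/8segment {bounds[-2]}")
    return flags


if __name__ == "__main__":
    for flag in segment_flags(1000, 4):
        print(flag)
```

Leaving the last range open-ended means `last_segment` only influences how evenly the work is balanced, not which segments get converted.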

Attention: Changing the character encoding makes database indexes on string fields unusable.
Affected indexes must be removed and rebuilt after the conversion.
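
Why re-encoding invalidates a string index can be illustrated with a small Python sketch. It assumes that the old database stored strings in Windows-1252 and that index keys are compared byte by byte; neither detail is stated in the ClassiX documentation, so this shows the general effect rather than ClassiX internals.

```python
def byte_order(keys: list[str], encoding: str) -> list[str]:
    """Sort keys by their encoded bytes, as a byte-wise index would."""
    return sorted(keys, key=lambda s: s.encode(encoding))


# "\u20ac" is the euro sign, "\u00e4" is a-umlaut.
keys = ["\u20acuro", "\u00e4hnlich", "zebra"]

# Assumption: cp1252 stands in for the legacy encoding.
old_order = byte_order(keys, "cp1252")
new_order = byte_order(keys, "utf-8")

if __name__ == "__main__":
    print(old_order)
    print(new_order)
    print("index order still valid:", old_order == new_order)
```

In cp1252 the euro sign is the single byte 0x80 and sorts before a-umlaut (0xE4); in UTF8 the euro sign starts with 0xE2 and sorts after a-umlaut (0xC3 0xA4). An index built on the old bytes therefore no longer matches the stored keys or their order.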

Deactivate affected indexes

  • select and deactivate them manually (interactively), or
  • deactivate all string-related indexes with the DeactivateAll method of the Index Manager, or
  • use the DeactivateSelected method of the Index Manager to deactivate all indexes that refer to data fields for which strings have actually been converted.
    Statistics on this are listed at the end of the conversion log file.

Reactivate affected indexes

Affected indexes must be rebuilt at the latest before verification with /8verify.

Notes on error checking

The conversion iterates over all ClassiX objects in the database.
Verification, on the other hand, iterates over all character strings in the database.
This also includes binary data, which is reported as errors:

  • the image data of the class CX_BITMAP (for the .bmp format, BM is always at the beginning; for .jpg, the character string exif) → ignore these errors!
  • the stream data of all COM objects → ignore errors in the corresponding segments!
  • the compressed bytes of the XML objects (CX_WORD_XML) → ignore errors in the corresponding segments!
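
The false alarms listed above follow from what a UTF8 compliance check does: it tests whether a byte sequence decodes as UTF8, and arbitrary binary data usually does not. A minimal Python sketch of such a check follows. The function name `is_valid_utf8` and the use of cp1252 as the legacy encoding are assumptions for illustration; the actual /8verify implementation is not documented here.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True if the byte sequence decodes as well-formed UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


converted = "M\u00fcller".encode("utf-8")      # string after conversion
unconverted = "M\u00fcller".encode("cp1252")   # assumed legacy encoding, not yet converted
jpeg_start = bytes([0xFF, 0xD8, 0xFF, 0xE1])   # JPEG data begins with the bytes FF D8

if __name__ == "__main__":
    print(is_valid_utf8(converted))    # True
    print(is_valid_utf8(unconverted))  # False: a genuine conversion error
    print(is_valid_utf8(jpeg_start))   # False: binary data, safe to ignore
```

The check cannot tell an unconverted string from image or stream bytes, which is why errors in segments holding such binary data have to be filtered out by hand.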


The "old" dictionary classes CX_INDEX, CX_INDEX_CI, CX_DICTIONARY and CX_DICTIONARY_CI are no longer supported as of version 4.7.0.
These objects are not converted and are therefore unusable after UTF8 conversion!

CX_MAX_AS_HANDLING

If problems with the address space occur during the conversion of certain segments, these should be converted separately. If this is still not sufficient, the environment variable CX_MAX_AS_HANDLING can be set to TRUE to activate maximum address space handling.
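
If the conversion is launched from a script, the variable can be set for the child process only, leaving the rest of the system untouched. The Python sketch below prepares such an environment. The helper name `conversion_environment` is hypothetical, and the cxgosuo.exe command line is deliberately left out because its database arguments are not described here.

```python
import os


def conversion_environment(base=None):
    """Return a copy of the environment with maximum address space
    handling switched on (hypothetical helper, not part of ClassiX)."""
    env = dict(os.environ if base is None else base)
    env["CX_MAX_AS_HANDLING"] = "TRUE"
    return env


if __name__ == "__main__":
    env = conversion_environment()
    print(env["CX_MAX_AS_HANDLING"])
```

The returned dictionary can be passed as the `env` argument of `subprocess.run` when starting the utility.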